Comments on: “Model Complexity Control for Regression Using VC

Authors

  • Giorgio Corani
  • Marino Gatto
Abstract

In [1], several model selection approaches were compared experimentally. One of the criteria considered was the Schwarz Information Criterion (SIC), but it was implemented incorrectly, and the same mistake has been repeated in more recent papers. Here, we show why the SIC formula originally employed was wrong and report the correct approach, which is well known in the statistics literature. By repeating several experiments from the original paper, we then show that SIC performs far better than reported in [1]. Nevertheless, we confirm that VC-based model selection is more powerful than SIC, especially for small samples.
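The abstract does not reproduce either formula. As a rough illustration only, the sketch below uses the textbook form of SIC (also known as BIC) for least-squares regression with Gaussian noise of unknown variance, n * ln(RSS / n) + k * ln(n), where n is the sample size, RSS the residual sum of squares, and k the number of fitted parameters. Whether this is exactly the form the authors adopt is an assumption, and the function names `sic_score` and `select_polynomial_degree` are hypothetical.

```python
import numpy as np


def sic_score(y, y_hat, num_params):
    """Textbook SIC/BIC for Gaussian least squares (additive constants dropped).

    Lower is better. Assumes RSS > 0; a perfect fit would give log(0).
    This is the standard statistics form, not necessarily the paper's.
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = y.size
    rss = float(np.sum((y - y_hat) ** 2))
    # Goodness-of-fit term plus a complexity penalty that grows with ln(n).
    return n * np.log(rss / n) + num_params * np.log(n)


def select_polynomial_degree(x, y, max_degree):
    """Fit polynomials of degree 0..max_degree and return the SIC minimizer."""
    scores = {}
    for degree in range(max_degree + 1):
        coeffs = np.polyfit(x, y, degree)   # ordinary least-squares fit
        y_hat = np.polyval(coeffs, x)
        scores[degree] = sic_score(y, y_hat, degree + 1)  # degree + 1 coefficients
    best_degree = min(scores, key=scores.get)
    return best_degree, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(-1.0, 1.0, 50)
    # Synthetic quadratic target with small Gaussian noise (illustrative data).
    y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.05, size=x.size)
    best, all_scores = select_polynomial_degree(x, y, max_degree=5)
    print("selected degree:", best)
```

Because the penalty term scales with ln(n), each extra parameter costs more as the sample grows, which is the usual reason SIC favors simpler models than AIC.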


Similar resources

High-dimensional classification by sparse logistic regression

We consider high-dimensional binary classification by sparse logistic regression. We propose a model/feature selection procedure based on penalized maximum likelihood with a complexity penalty on the model size and derive the non-asymptotic bounds for the resulting misclassification excess risk. The bounds can be reduced under the additional low-noise condition. The proposed complexity penalty ...


Comparison of Model Selection for Regression

We discuss empirical comparison of analytical methods for model selection. Currently, there is no consensus on the best method for finite-sample estimation problems, even for the simple case of linear estimators. This article presents empirical comparisons between classical statistical methods - Akaike information criterion (AIC) and Bayesian information criterion (BIC) - and the structural ris...


Measuring the VC-dimension Using Optimized Experimental

VC-dimension is the measure of model complexity (capacity) used in VC-theory. The knowledge of the VC-dimension of an estimator is necessary for rigorous complexity control using analytic VC generalization bounds. Unfortunately, it is not possible to obtain the analytic estimates of the VC-dimension in most cases. Hence, it has been recently proposed to measure the VC-dimension of an estimat...


Investigating the relationship among complexity, range, and strength of grammatical knowledge of EFL students

Assessment of grammatical knowledge is a rather neglected area of research in the field with many open questions (Purpura, 2004). The present research incorporates recent proposals about the nature of grammatical development to create a framework consisting of dimensions of complexity, range and strength, and studies which dimension(s) can best predict the stat...


Penalty Functions for Genetic Programming Algorithms

Very often symbolic regression, as addressed in Genetic Programming (GP), is equivalent to approximate interpolation. This means that, in general, GP algorithms try to fit the sample as well as possible but no notion of generalization error is considered. As a consequence, overfitting, code-bloat and noisy data are problems which are not satisfactorily solved under this approach. Motivated by...



Publication date: 2006